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Abstract 

We introduce the Pade-Z2 (PZ) stochastic estimator for calculating determinants 
and determinant ratios. The estimator is applied to the calculation of fermion determi- 
nants from the two ends of the Hybrid Monte Carlo trajectories with pseudofermions. 
Our results on the 8^ x 12 lattice with Wilson action show that the statistical errors 
from the stochastic estimator can be reduced by more than an order of magnitude by 
employing an unbiased variational subtraction scheme which utilizes the off-diagonal 
matrices from the hopping expansion. Having been able to reduce the error of the 
determinant ratios to about 20 % with a relatively small number of noise vectors, this 
may become a feasible algorithm for simulating dynamical fermions in full QCD. We 
also discuss the application to the density of states in Hamiltonian systems. 
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1 Introduction 



At present, lattice gauge Monte Carlo calculation is still the only viable and practical means 
of solving QCD and computing hadron masses and matrix elements non-perturbatively. As 
such, there is a perpetual need of sharpening the tools to tackle the numerically intensive 
aspects of the computation, especially those pertaining to dynamical fermions. 

The Euclidean functional integration formulation of the quantum field theory of gauge 
bosons and fermions has the generic partition function 



[dU][dilj] [dij]e 



'Sg-t(>Mll) 



where U is the gauge link variable and Sg the gauge part of the action. Given that the 
fermion part of the action is quadratic in the Grassmann numbers ip and ip, they can be 
formally integrated out to give a fermion determinant, i.e. 



J[dU]detM[U]e-^o. (2) 



Since numerically the computation of the fermion determinant is much more demanding a 
task than the updating of gauge links U, it is often approximated by a constant. This is known 
as the quenched approximation which has previously been interpreted as tantamount to 
neglecting the internal quark loops. Recently, Sexton and Weingarten ^ have advanced the 
view that it actually corresponds to the inclusion of the leading terms in the loop expansion 
which are commensurate with the size of loops in the gauge action. This leads to a shift 
in (3 or the bare coupling constant. Although a number of low-energy quantities, such as 
hadron masses 0, weak matrix elements 0, ^, and hadron structure [|, |^ are reproduced 
reasonably well (within 6% to 15% in many cases) in the quenched approximation, we know 
that the chiral log behaviors of these quantities H , the rj' mass, and the phase transition 
1^ depend crucially on the full inclusion of dynamical fermions. Thus, solving full QCD with 
dynamical quarks remains a desirable and challenging ultimate goal. 

In view of the perceived difficulty of calculating determinants accurately, all the existing 
working algorithms have avoided calculating them directly. Instead, pseudofermions |]10[ and 



local bosons [|lT| are introduced to bonsonize the determinantal effects. For example, the 
current state of the art algorithm - Hybrid Monte Carlo (HMC) [T^ transforms the partition 
function in Eq. to the following one for 2 degenerate flavors 

[df/][rf0][#*]e-^«-^*(^'^)"^ . (3) 

where is a pseudofermion variable which can be generated using a Gaussian heatbath. 
This gives rise to the pseudofermion force which acts over the course of molecular dynamics 
trajectories to update the gauge field and the fermion matrix M which in turn enables the 
updating of the field. 

On the other hand, it may be desirable to admit the determinantal effects directly without 
resorting to the superfiuous degrees of freedom from pseudofermions. This can be done in 
principle with the partition function in Eq. (0) rewritten as 

\dU]e~^^^^''^°^^, (4) 
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and TrlogM which reflects the dynamical fermions is taken as an additional part of the 
gauge action. It is shown in |jl3l that a HMC-like algorithm based on the partition function 
in Eq. (Q) is valid provided that Tr log M can be estimated without bias. However, the task 
at first sight appears daunting. First of all, one needs an efficient algorithm to calculate 
TrlogM. This is apparently much more intensive numerically than calculating (M^M)~^0 
in the pseudofermion approach. In addition, the demands on the accuracy of Tr log M are 
very stringent. Since the relative error of detM is the absolute error in TrlogM, a 20% 
error in det M for example would require calculating Tr log M (which is of the order N) to 
within 0.2. Luckily, the Monte Carlo updating involves only determinant ratios, and not the 
determinants themselves. One would expect that an accuracy of ~ 0.2 should be somewhat 
easier to achieve for the difference TrlogMi — TrlogM2 than for each term separately. 

We shall present in this manuscript an efficient stochastic algorithm to estimate Tr log M 
which has the potential of achieving the kind of accuracy (~ 0.2) in Trlog difference with 
relatively small (~ 500) number of noise vectors. This new algorithm invokes the Fade 
approximation for the log M and uses complex Z2 noise to estimate the trace, as introduced in 
Sec. 2. We have tested it by calculating the determinants and determinant ratios of fermion 
matrices from both ends of randomly chosen molecular dynamics trajectories generated by 
Hybrid Monte Carlo with pseudofermions. We also applied the method to Wilson fermions 
on an 8'^ x 12 lattice, and studied its dependence on the rank of the Fade expansion and the 
number of noise vectors. In Sec. 3 we introduce an unbiased variational subtraction scheme 
which is based on the subtraction of traceless terms in the hopping parameter expansion. We 
find that this can reduce the statistical error by an order of magnitude leading to an error 
in the range of 0.2 - 0.3 with 400 - 600 noise vectors. These results are presented in Sec. 
4. We should mention that there exist other stochastic estimators for determinants. Ref. 
uses the Chebyshev polynominal to expand logM"'"M and gaussian noise to estimate the 
trace. Ref. uses Z2 noise and the Riemann-Stieltjes integral to estimate the TrlogM. 
The subtraction scheme we introduce here is applicable to both of these approaches. A 
discussion of application to density of states in Hermitian Hamiltonian systems is presented 
in Sec. 5. Sec. 6 gives the conclusions and outlook. 



2 The basic Fade - Z2 Method 
2.1 Pade approximation 

The starting point for the current algorithm is the Fade approximation of the logarithm func- 
tion. The Fade approximant to \og{z) of order [K, K] at zq is a rational function N{z)/D{z) 
where deg N{z) = deg D{z) = K, whose value and first 2K derivatives agree with logz 
at the specified point zq. When the Fade approximant N{z)/D[z) is expressed in partial 
fractions, we obtain 

logz^6o + Ef^V (5) 
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whence it follows 



K 



log det M = Tr logM ^ 6oTrI + X! ^fc ' Tr(M + Cfel)"\ 



(6) 



k=l 



For the purpose of Monte Carlo updating, only the ratio of determinants is needed. In 
this case, the log of the determinant ratio detMi/detM2 is approximated as. 



logldetMi/detMs} = TrflogMi - logMs] 

^ 5]6fc(Tr(Mi + cJ)-i-Tr(M2 + cJ)-^ 



(7) 



where Mi and M2 are matrices at the beginning and end of an HMC trajectory for example. 

This approximation is accurate so long as the eigenvalues of the matrices Mi and M2 
all lie in the region in the complex plane where the Fade approximation is accurate. If we 
define e{z) to be the difference between the right and left-hand sides of Eq. (^, then the 
error in the approximation in Eq. (0) is 



^TrwM - IIe(A„), 



where {A„} are the eigenvalues of M. 
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Figure 1: e{z) for the Fade approximation of \ogz on the positive real axis for different 
orders of the Fade expansion at zq = 1.0. 
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The accuracy of the Pade approximation is graphically illustrated in Figures 1-6. In Fig. 
1, we plot e{z) for different orders of the Pade approximation on the positive real axis. We 
see that e{z) for the 5th order Pade reaches quickly to the order of 10~^ around the expansion 
point Zo = 1, whereas that of the 7th order does not reach 10~^ until z is smaller than 0.3 
and greater than 3. For the 11th order, the domain for which e{z) < 10~^ is extended to 
between 0.1 and 10. 
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Figure 2: The same as Fig. 1 with a larger domain of z. 

Fig. 2 shows the same plot with a larger domain of z where e{z) is larger when it is 
farther away from the expansion point. Note that in these figures e{l/z) = —t{z), i.e. the 
error function is antisymmetric under the transformation z 1/z (due to the corresponding 
antisymmetry of the logarithm). Hence, if the expansion point is suitably chosen near the 
'center' of the eigenvalue distribution, we shall get error cancellation. The coefficients of the 
Pade approximation bk and Ck in Eq. for 5th, 7th, 9th, and 11th orders used to produce 
Figs. 1 and 2 are tabulated in Table 1. 

In Figs. 1 and 2, the Pade expansion point is chosen as zq = 1. The error functions ezo{z) 
for other expansion points are identical, modulo a change of scale. This is due to the fact 
that logz = \og{z/zo) +log(2;o); and hence the Pade approximation Pzo{z) for log 2; around 
Zq is equal to 

P,„(-2) =^1(^/^0) +log(^o). (9) 

It follows that 

ez,{z) = Piiz/zo) - \ogiz/zo) = eiz/zo). (10) 

We see from these figures that as long as one has some notion about the domain of the 
eigenvalues of the matrix M and the desirable level of accuracy one needs for TrlogM, one 
can decide on an appropriate order of the Pade approximation and the expansion point zo. 
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Table 1: The coefficients bk and Ck of Pade expansions P[K, K] for log^ at zq = 1.0. 



k 


bk 


Ck 


bk 


Ck 


P[5,5] 
{bo = 4.566666667) 


P[7,7] 
{bo = 5.185714286) 


1 


0.130408172495391 


0.0492189449629343 


0.06816753269 


0.02611045152 


2 


0.404437841802115 


0.299993435237384 


0.1844456100 


0.1484146919 


3 


1.13777777850000 


1.00000000000000 


0.3863893344 


0.4226317870 


4 


4.49394268525114 


3.33340630084376 


0.8359183641 


1.0000000000 


5 


53.8334305460671 


20.3173813294693 


2.163220583 


2.3661258591 


6 






8.373644726 


6.737877409 


7 






99.98821380 


38.29883980 


P[9,9] 
{bo = 5.65793650793651) 


P[ll,ll] 
{bo = 6.03975468975469) 


1 


.041962745788105575 


.01617742288114674 


.02845031368729848 


.01100547288317344 


2 


.10717746225992 


.08930616263474539 


.07053085455625973 


.05984825317707202 


3 


.20024123115789 


.2396401470008922 


.1244662250449701 


.1559677956367871 


4 


.35622568741554 


.5102849384064862 


.2021047007456418 


.3165723758668845 


5 


.6604787100025195 


1.0000000000000000 


.3261128629248666 


.5753698412085129 


6 


1.36804295333940783 


1.959689429836578 


.5458501735606061 


1.000000000000000 


7 


3.48685872889035930 


4.172923495979481 


.9850850802286254 


1.738012541463052 


8 


13.4381848942707415 


11.9743554641257 


2.016649317776546 


3.158835312972759 


9 


160.340827693254954 


61.81454285684810 


5.116602053912441 


6.411580005456822 


10 






19.69138119799427 


16.70892543916556 


11 






234.8927672230579 


90.86388296216937 



The Pade approximation of the logarithm is not limited to the real axis. It applies equally 
well to the complex plane, except near the branch cut. Pade approximation of the logarithm 
about a positive real zo corresponds to a branch cut along the negative real axis, and the 
poles of the Pade functions all lie on the negative real axis. Using coefficients from Table 1 
which are obtained from expansion about zq = 1, we plot the Pade approximated logz along 
the unit circle as a function of arg(z). The real part as shown in Fig. 3 reveals the pole at 
z = —1 for all the Pade functions with different orders. Fig. 4 shows the imaginary part. 
The imaginary part of the log function has a discontinuity from — vr to tt across the negative 
real axis, while the Pade approximations have discrete poles. Thus, the Pade approximation 
fails near the branch cut. Nevertheless, it works for cases away from the branch cut. In 
some cases, it maybe desirable to place the branch cut in a different location, say along the 
ray aig{z)=6. This is easily done by choosing the expansion point zq as e~*^, as can be seen 
from Eq. (^). As long as there are no eigenvalues along the ray a.Tg{z)=6 for the matrix M, 
one can apply the Pade approximation outlined above to calculate determinants which are 
negative or complex. 

It is worth mentioning that the Pade approximation furnishes a much more accurate 
global approximation to the logarithm than that obtained from the "Green function" method [jl6 
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Real part of Pade[K,K](z) for ABS(z)=1 .0 
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Figure 3: The real part of Fade approximated log^ along a circle of radius 1 as a function of 
arg(z) for several different orders of the Fads expansion. The expansion point is at zq = 1. 
Imaginary part of Pade[K,K](z) for ABS(z)=1 .0 
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Figure 4: The imaginary part of Fade approximated logz along a circle of radius 1 as a 
function of arg(z) for several different orders of the Fads expansion.The expansion point is 
at zo — 1. 
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Real part of Pade[K,K] for ABS(z)=1 0.0 
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Figure 5: The same as in Fig. 3 for the circle of radius 10. 
Imaginary part of Pade[K,K] for ABS(z)=10.0 
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Figure 6: The same as in Fig. 4 for the circle of radius 10. 
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Figure 7: The Green function approximation to log with 1 1 terms and several e are compared 
to the log function itself. 



The Green function approximation makes use of the fact that 



lim 



1 



TT e-*o A — Ao — ^e 



(5(A-Ao), 



whence it follows 



log X-p{X)dX 
~ I^logAi • — Im / 



P(A) 



-dX 



= i^logAi- AAIm[Tr(M-Ai-ze)-^ 

where Aj are evenly spaced on [Xmin, ^max], and e is a small parameter. 
This approximation may be rewritten as 

/ logX- p{X)dX^ fe{X)p{X)dX, 

e log Xi 



(11) 

(12) 
(13) 
(14) 



where 



TT ^ (A - A,)2 + e2 



(15) 



(16) 
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Figure 7 shows that the function /e(A) with eleven terms and several e furnishes a very 
poor approximation to the logarithm on the interval 1/60 < A < 60. By contrast, Figure 2 
shows that an eleven-term Fade expansion on the same interval approximates the logarithm 
to ±.0005 at worst, and to much higher accuracy on most of the interval. Also, the Green 
function method is only applicable if the eigenvalues of the matrix are real, while the Fade 
approximation also holds for matrices with complex eigenvalues. 



2.2 Complex Z2 noise trace estimation 

Exact computation of the trace inverse for N x N matrices is very time consuming for 
matrices of size ~ 10^. However, the complex Z2 noise method has been shown to provide 
an efficient stochastic estimation of the trace |T^, ^, In fact, it has been proved to be an 
optimal choice for the noise, producing a minimum variance ^ . 



The complex Z2 noise estimator can be briefly described as follows |T^, . We construct 



L noise vectors 77^,77^ 



]^ where t]^ = ''725 ''^s; ' ' ' jVn}^^ follows. Each element 77^ 



takes one of the four values {±1, ±«} chosen independently with equal probability. It follows 
from the statistics of r/^ that 



E[< >]^E[-J2vi]=0, 
^3=1 



(17) 



The vectors can be used to construct an unbiased estimator for the trace inverse of a given 
matrix M as follows: 



E[< ri^M-^T] >] 



1 L TV 

j=l m,n=l 
1 L N 

jT.iT.M-i.)EKWn]- 

^ j=l n=l 



N 



j m^n 



N ^ 1 

EM-i + (EM-;j[-EC'^^ 

Tr M-^ 



The variance of the estimator is shown to be 19 



M 



= Var[< T]^M.-^T] >] = E [1 < r^^M"^// > -Tr M"^] 



N 



N 



- V (M 

/ J m,n\ 1 



1 y 



1 |2 

m,n I 



The stochastic error of the complex Z2 noise estimate results only from the off-diagonal 
entries of the inverse matrix (the same is true for Z„ noise for any n). However, other noises 
(such as gaussian) have additional errors arising from diagonal entries. This is why the Z2 
noise has minimum variance. For example, it has been demonstrated on a 16^ x 24 lattice 
with (3 = 6.0 and k = 0.148 for the Wilson action that the Z2 noise standard deviation is 
smaller than that of the gaussian noise by a factor of 1.54 [0. 
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Applying the complex Z2 estimator to the expression for the determinant ratio in Eq. 
(0), we find 

^ 6,{Tr(Mi + Cfc)-i - Tr(M2 + Ck)-'} 

k 

~ ^ EEM''^[(^i + ck)-' - (M2 + ckr'w 

k j 

= Tj:j:witi'-t2% (18) 
^ j k=i 



where '-^ = (Mj + Cfcl) "'^r]-' are the solutions of 



(Mi + cJ)e^' = (19) 
(M2 + cJ)e2''' = A;,j = l,2,-- - . (20) 



Since Mj + c^I are shifted matrices with constant diagonal matrix elements, Eqs. (^9|) and 
(^) can be solved collectively for all values of Ck within one iterative process by several 
algorithms, including the Quasi-Minimum Residual (QMR) |2T|], Multiple-Mass Minimum 



Residual (M^ R) [|2], and GMRES|2|]. We have adopted the M^ R algorithm, which has 



been shown to be about 2 times faster than the conjugate gradient algorithm, and the 



overhead for the multiple Ck is only 8% The only price to pay is memory: for each Ck, 
a vector of the solution needs to be stored. Furthermore, one notices that > in Table 1. 
This improves the conditioning of (M -|- c^I) since the eigenvalues of M have positive real 
parts. Hence, we expect faster convergence for column inversions for Eqs. ([I9|) and (0). 

In HMC, the difference between the Tr log M at the beginning and the end of the molecu- 
lar dynamics trajectory, i.e. Tr logMi — Tr logM2 is ~ Thus the standard deviation a 
encountered in the estimation of Tr log Mi — Tr log M2 from the Z2 noise is of the same order 
as the estimated value itself, i.e. ~ 0{1). However, this is not good enough. In practice, 
we find that one needs ~ 319, 000 noise vectors to reduce the stochastic error to 0.2. In the 
next section, we describe a method which significantly reduces the stochastic error. 



3 Improved PZ Estimation with Unbiased Subtraction 

In order to reduce the variance of the estimate, we introduce a suitably chosen set of traceless 
N X N matrices Q^^-*, i.e. which satisfy J2n=i Qnli = 0, p = 1 ■ ■ ■ P. The expected value and 
variance for the modified estimator < r]^(Wl~^ — Ylp=i ^pQ^^^)v > are given by 

E[< rj\M-' - {2 ApQ^"^))r7 >] = Tr , (21) 

p=i 

AM(A)=Var[<r^t(M-i-f]ApQ(P))r/>] = ^ \^^]n - KQ^,?.]n)\\ (22) 

p=l m^n p=l 

for any values of the real parameters Xp. In other words, introducing the matrices Q'-''^ 
into the estimator produces no bias, but may reduce the error bars if the Q^^^ are chosen 
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judiciously. Further, Aj, may be varied at will to achieve a minimum variance estimate: this 

corresponds to a least-squares fit to the function ?7'''M~^?7 sampled at points ?7j, j = 1 ■ ■ ■ L, 
using the fitting functions ^1 , rj'^ Q^^^ rjX , p — 1 ■ ■ ■ P. Making the definition Q*^°) = I, the 
usual least-square equations then yield 



A 
A 



C 



pq 



an 



C ^ a, where 
(Ao, Ai, • • • , Ap)^, 

L 
L 



and the trace estimate is given by ■ Aq. 

We now turn to the question of choosing suitable traceless matrices Q*^'''-' to use in the 
modified estimator. One possibihty for the Wilson fermion matrix M = I — is suggested 
by the hopping parameter — k expansion of the inverse matrix, 



(M + Cfel)-' = 



M + c,I (l + c,)(I- (^D) 



+ 



K, 



1 + Cfc (1 + Cfc)' 



K 



(1 + Ck) 



rD3 + 



(23) 



This suggests choosing the matrices Q*^^^ from among those matrices in the hopping param- 
eter expansion which are traceless: 



Q(2r+1 



K 



.2 



D, 



K 



(1 + Ck) 

(1 + Cfc) 



(D^-TrD^), 



^D^ -TrD^), 



(1 + Cfc) 



2r+2 



D 



2r+l 



r = 3,4,5,--- . 



It may be verified that all of these matrices are traceless. In principle, one can include all 
the even powers which entails the explicit calculation of all the allowed loops in TrD^^. In 
this manuscript we have only included Q^-^^ Q^^), and Q^^^+i). Note that Tr in Q^'') can 
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be evaluated from 

IV = -32 Tr Up, (24) 

p 

where Tr Up is the plaquette and - 32 comes from the trace of the product of 1 ± 7^ in the 
the Wilson action. Similarly, Tr in Q*-^-* can be evaluated from 3 classes of 6-link loops, 

Ti{D'} = -128 Ul, - 64 5: Ul, - 64 J] Ul„ (25) 

LiGB. L2GP L36C 

where Li stands for the sum over rectangles, L2 over parallelograms, and L3 over chairs. 
We then set, 

Q(A) = AiQ(i) + AaQ^') + AgQ^^^^ + A4Q(') + AgQ^'^ + AeQ^'^ + ■ ■ ■ + A2.+iQ('''+') + ■ ■ ■ , (26) 

and perform the variation process to get an optimal choice of { Ai, A2, ■ ■ ■ , X2r+i, ■ ■ 'jopt- The 
additional computational cost incurred by the modified estimator is P additional matrix 
vector multiplications per noise vector. Since P is small (~ 9), this overhead is essentially 
negligible compared to solving Eqs. ( pj]) and (|20|) . 



In actual practice, we generate L complex Z2 noise vectors, and obtain basic PZ estimates 
using the M^R matrix inversion algorithm. The auxiliary data used in the improved PZ 
estimates may be computed via a few matrix-vector multiplications. 

• Unimproved estimates: {Oi, O2, ■ ■ ■ , Ol}, with Oj = bk^i^^^kj, j = 1, 2, ■ ■ ■ , L, 



auxihary data set: {d\'\ Di'\ ■ ■ ■ , D^^}, with Z^f = Ek (^^(r/^'^Dr^^), 
2"'^ auxiliary data set: {^f \ ■ ■ ■ , ^if^}, with uf = Ek (^^^^(^■'^D^r^J) 

(3) 



3^^^ auxihary data set: {Dr , D)^\ • ■ • , D^^}, with Df^ = Ej j^rl^iv^^'D^V^ 

(l + Ci) 



4*^^ auxihary data set: {D^f^ , Di^\---, D^^^}, with = Efc jff^iv^^'^^V^ - TrD^ 
5"" auxihary data set: {d[^\ of, ■ ■ ■ , of}, with of = Ei r^^iv^^'D^V^), 



6*^ auxihary data set: {vf, Df\ ■ ■ ■ , of}, with = Ek jTT^iv^^^^V^ - TrD« 



. Higher odd-terms: {Df^^^ ■ ■ ■ , I^f +^)}, ^ith of'^'^ = j^^^iv'^r^^'-'W), 

Using these data, a least squares fit is performed to yield a set of {Aq, Ai, A2, A3, A4, A5, Ag, \{2r+i)}opt, 
which minimizes the variance Eq. ( p2D of the improved estimator over the {L} noise vectors. 
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Table 2: Unimproved and improved PZ estimates for log[detMi] with 100 complex Z2 noise 
vectors, k, = 0.150. 



P[K,K]{z) 


K = 


5 


7 


9 


11 


zo = 0.1 


Original: 


473(10) 


774(10) 


796(10) 


798(10) 




Improved: 


487.25(62) 


788.17(62) 


810.83(62) 


812.33(62) 


zo = 1.0 


Original: 


798(10) 


798(10) 


798(10) 


799(10) 




Improved: 


812.60(62) 


812.37(62) 


812.36(62) 


812.37(62) 



4 Computations of determinants and determinant ra- 
tios 

Our numerical computations were carried out with the Wilson action on the 8'^ x 12 (A^ 
= 73728) lattice with P = 5.6. We use the HMC with pseudofermions to generate gauge 
configurations. With a cold start, we obtain the fermion matrix Mi after the plaquette 
becomes stable. The trajectories are traced with r = 0.01 and 30 molecular dynamics steps 
using K = 0.150. M2 is then obtained from Mi by an accepted trajectory run. Hence Mi 
and M2 differ by a continuum perturbation, and log[detMi/ det M2] ~ C'(l)- 

We first calculate log det Mi with different orders of Pade expansion around zq = 0.1 
and Zq = 1.0. We see from Table 2 that the 5th order Pade does not give the same answer 
for two different expansion points, suggesting that its accuracy is not sufficient for the range 
of eigenvalues of Mi. Whereas, the 11th order Pade gives the same answer within errors. 
Thus, we shall choose P[ll,ll](z) with zq = 0.1 to perform the calculations from this point 
on. 

Table 3 shows the optimal choice of parameters Aj, i = 1,5 with different subtraction 
sets and various Z2 noise lengths. The fact that Aj ~ 1.0, i = 1,2,3,5 and A4 ~ gives 
further evidence that the fluctuations due to the complex Z2 noise are indeed introduced by 
the off-diagonal matrix elements. 

In Tables 4 and 5, we give the results of improved estimations for Tr log Mi and Tr log M2 
respectively. We see that the variational technique described above can reduce the data 
fluctuations by more than an order of magnitude. For example, the unimproved error 6q = 
5.54 in Table 4 for 400 Z2 noises is reduced to = 0.15 for the subtraction which includes 
up to the Q^^ matrix. This is 37 times smaller. Comparing the central values in the last row 
(i.e. the ll*'' order improved) with that of unimproved estimate with 10,000 Z2 noises, we see 
that they are the same within errors. This verifles that the variational subtraction scheme 
that we employed does not introduce biased errors. The improved estimates of TrlogMi 
from 50 Z2 noises with errors 6r from Table 4 are plotted in comparison with the central 
value of the unimproved estimate from 10,000 noises in Fig. 8. 

Our unimproved results have the similar size errors as those obtained by the Chebyshev 
polynominal expansion of logM^^M |[l|, thus one can similarly improve its estimation with 
the variational subtraction scheme introduced here. 
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Table 3: The variational parameters {X}opt in Eq- ( p6D for different subtraction data set 
listed above and various Z2 noise length L for Mi. 



Improved 


Z2 length 


Ai 


A2 


A3 


A4 


A5 




L=200 


0.960 


— 


— 


— 


— 


L=600 


0.968 


— 


— 


— 


— 






L=200 


0.960 


0.919 


— 


— 


— 




L=600 


0.972 


0.982 


— 


— 


— 


3rd. 


L=200 


0.968 


0.942 


0.908 


— 


— 




L=600 


0.972 


0.969 


0.919 


— 


— 




L=200 


0.973 


0.921 


0.903 


0.092 


— 




L=600 


0.976 


0.924 


0.967 


0.089 






L=200 


0.990 


0.926 


0.908 


0.089 


1.27 




L=600 


0.992 


0.924 


0.910 


0.087 


1.26 


11"^: 


L = 50 


0.998 


0.913 


0.968 


0.088 


1.06 




L = 100 


0.992 


0.890 


0.953 


0.086 


1.48 




L = 150 


0.995 


0.886 


0.939 


0.086 


1.18 




L = 200 


0.999 


0.923 


0.951 


0.086 


1.09 




L = 300 


0.997 


0.924 


0.934 


0.087 


1.06 




L = 400 


0.998 


0.925 


0.959 


0.088 


1.07 




L = 600 


1.001 


0.923 


0.964 


0.085 


1.00 



Table 4: Central values for improved stochastic estimation of log[det Mi] and rth-order 
improved Jackknife errors 6r are given for different numbers of Z2 noise vectors, n is 0.150 
in this case. 



#Z2 


50 


100 


200 


400 


600 


800 


1000 


3000 


10000 




802.98 


797.98 


810.97 


816.78 


815.89 


813.10 


816.53 


813.15 


812.81 


So 


±14.0 


±9.81 


±7.95 


±5.54 


±4.47 


±3.83 


±3.41 


±1.97 


±1.08 




807.89 


811.21 


814.13 


815.11 


814.01 


814.62 


814.97 






Si 


±4.65 


±3.28 


±2.48 


±1.84 


±1.50 


±1.29 


±1.12 






2nd 


813.03 


812.50 


811.99 


812.86 


811.87 


812.89 


813.04 






S2 


±2.46 


±1.65 


±1.34 


±1.01 


±0.83 


±0.72 


±0.64 






yd 


812.07 


812.01 


811.79 


812.44 


812.18 


812.99 


813.03 






S3 


±1.88 


±1.31 


±1.01 


±0.74 


±0.58 


±0.51 


±0.44 








812.28 


812.52 


812.57 


812.86 


812.85 


813.25 


813.40 






Sa 


±1.20 


±0.94 


±0.68 


±0.48 


±0.39 


±0.35 


±0.30 








813.50 


813.07 


813.36 


813.70 


813.47 


813.54 


813.50 






S5 


±0.82 


±0.62 


±0.47 


±0.34 


±0.29 


±0.25 


±0.22 






QtS 


813.54 


813.23 


813.22 


813.28 


813.20 


813.37 


813.26 






Se 


±0.67 


±0.49 


±0.35 


±0.25 


±0.21 


±0.18 


±0.16 








814.18 


813.74 


813.44 


813.42 


813.39 










Si 


±0.44 


±0.36 


±0.26 


±0.19 


±0.16 












813.77 


813.62 


813.49 


813.40 


813.43 










S9 


±0.40 


±0.30 


±0.22 


±0.16 


±0.14 












813.54 


813.41 


813.45 


813.44 


813.43 










Sii 


±0.38 


±0.27 


±0.21 


±0.15 


±0.13 
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Table 5: The same as in Table 4 for log[det Ma]. 



#Z2= 


50 


100 


200 


400 


600 


800 


1000 


3000 


10000 




788.96 


793.87 


809.08 


813.03 


813.06 


811.14 


815.29 


811.48 


809.54 


So 


±15.8 


±10.7 


±8.27 


±5.89 


±4.66 


±3.94 


±3.46 


±1.99 


±1.09 




808.94 


814.00 


811.94 


811.85 


811.73 


811.77 


812.22 






Si 


±5.00 


±3.69 


±2.69 


±1.91 


±1.53 


±1.39 


±1.25 








810.75 


811.11 


810.29 


810.21 


810.15 


810.43 


811.05 






S2 


±2.67 


±1.94 


±1.40 


±1.03 


±0.84 


±0.73 


±0.66 






3rd 


806.65 


808.13 


809.28 


809.58 


810.06 


810.56 


810.52 






^3 


±1.79 


±1.36 


±1.01 


±0.69 


±0.56 


±0.48 


±0.43 






^th 


807.80 


808.89 


810.02 


809.76 


810.06 


810.32 


810.40 






S4 


±1.17 


±0.97 


±0.69 


±0.49 


±0.39 


±0.33 


±0.30 








809.92 


809.80 


810.55 


810.68 


810.82 


810.85 


810.79 






s. 


±0.89 


±0.69 


±0.52 


±0.37 


±0.29 


±0.25 


±0.22 






<jts 


810.47 


809.93 


810.65 


810.58 


810.74 










S7 


±0.75 


±0.62 


±0.46 


±0.32 


±0.25 










Qth 


809.04 


809.90 


810.70 


810.71 


810.80 










S9 


±0.66 


±0.59 


±0.43 


±0.29 


±0.23 










nth 


810.02 


809.73 


810.65 


810.71 


810.81 










Sn 


±0.65 


±0.56 


±0.42 


±0.29 


±0.23 
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Improved estimates for Tr log M_1 



Improved at kappa=.150 
Unimproved: 812.8 +/- 1.1 



810 



805 - 



800 u ' ' ' ' ' ' 1 

2 4 6 8 10 12 

order of subtraction 

Figure 8: The improved PZ estimate of Tr log Mi with 50 noises as a function of the order of 
subtraction and compared to that of unimproved estimate with 10,000 noises. The dashed 
hues are drawn with a distance of 1 o" away from the central value of the unimproved estimate. 

Results for Tr[logMi — logM2] are shown in Table 6. We see that again the errors are 
reduced by a factor ~ 34. for 50 Z2 noise vectors is even smaller than the unimproved error 
60 with L = 10,000. To achieve the same level of accuracy for the unimproved estimation, 
it would require ~ 65,955 noise vectors. This is 1,319 times more than the 50 noise case 
which employs subtraction. Again, to show that the subtraction does not introduce biased 
errors, we plot in Fig. 9 the improved PZ estimates of Tr[logMi — logM2] with errors from 
50 noise vectors as a function of the order of subtraction and verify that they agree with 
that of the unimproved estimate with 10,000 noises. 

As for the quark mass dependence, we expect that the error will go up as the quark mass 
becomes smaller. Indeed, we show in Table 7 the results of Tr[logMi — logM2] for k = 0.154 
that the errors are larger. The PZ estimates and their errors in Table 7 for k = 0.154 are 
similarly plotted as a function of the order of subtraction in Fig. 10. 

We have also generated a sequence of configurations through HMC updating with pseudofermions. 
In Table 8 we list the change of the gauge, the pseudofermion, and the kinetic energy parts 
of the action from 10 molecular dynamics trajectories. The total change in energy AH is 
~ 0(1). Also listed are the change of TrlogM, i.e. A(TrlogM) = TrlogMi — TrlogM2. 
It is somewhat surprising to see that the absolute values of A (TrlogM) are an order of 
magnitude smaller that those of AUpseudo (the pseudofermion part of the action), and their 
signs can be different. This may be related to the observation that it takes very long to 
decorrelate the global topological charge in HMC with pseudofermions ||25|. This will be 
investigated further in the future. 
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Improved estimates for Tr[log M_1 - log M_2] 



15 



10 



5 







< 




Improved at kappa=. 150 ^ — 
Unimproved: 3.3 +/- 1.1 








" J 1 5 i i 




< 


• 





2 4 6 8 10 12 

order of subtraction 

Figure 9: The same as in Fig. 8 for Tr[logMi — logM2] with k = 0.150. 



Improved estimates for Tr[log M_1 - log M_2] 
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Unimproved: 4.3 +/- 1 .2 
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Figure 10: The same as in Fig. 9 for Tr[logMi - logMs] with k = 0.154. 
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Table 6: The same as in Table 4 for log[det Mi / det Ms]. 



L 


50 


100 


200 


400 


600 


800 


1000 


3000 


10000 


nth 
U 


1 4 n 




1 on 
i.yu 


0. 1 




i.yu 


1 94 


l.Kj 1 


'? 98 


So 


±15.4 


±11.5 


±8.20 


±5.59 


±4.68 


±4.01 


±3.54 


±2.06 


±1.13 




4.46 


2.39 


2.30 


3.23 


2.24 


2.85 


2.75 


3.16 




5i 


±4.40 


±3.56 


±2.75 


±1.99 


±1.67 


±1.45 


±1.31 


±0.75 






1.81 


1.49 


1.72 


2.62 


1.75 


2.45 


1.99 


3.05 




52 


±2.43 


±1.98 


±1.49 


±1.11 


±0.95 


±0.82 


±0.73 


±0.42 




3rd 


3.85 


2.72 


2.45 


2.78 


2.16 


2.43 


2.48 


3.04 






±2.04 


±1.45 


±1.05 


±0.76 


±0.63 


±0.54 


±0.48 


±0.27 






3.41 


2.77 


2.51 


3.04 


2.81 


2.92 


2.99 








±1.34 


±0.97 


±0.70 


±0.51 


±0.42 


±0.36 


±0.33 








2.84 


2.84 


2.75 


2.95 


2.62 


2.68 


2.70 






k 


±0.91 


±0.65 


±0.51 


±0.38 


±0.32 


±0.27 


±0.24 








2.75 


2.80 


2.65 


2.65 


2.49 


2.54 


2.51 






Se 


±0.74 


±0.52 


±0.39 


±0.29 


±0.24 


±0.20 


±0.18 






<jts 


3.18 


3.27 


2.77 


2.90 


2.77 










(h 


±0.51 


±0.35 


±0.27 


±0.21 


±0.18 












3.15 


3.39 


2.78 


2.81 


2.76 










^9 


±0.47 


±0.32 


±0.25 


±0.19 


±0.16 










nth 


3.04 


3.33 


2.76 


2.83 


2.72 












±0.44 


±0.29 


±0.23 


±0.17 


±0.14 
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Table 7: Same as Table 6 for the log[det Mi/det M2] at = .154 instead of = .150. 



Zo * 


50 


100 


200 


400 


600 


800 


1000 


3000 


10000 


rdh 
U 




Z.KJKJ 


9 '?9 




0.0^ 


9 fiS 
z.uo 


1 Q9 


0.00 






±16.3 


±12.0 


±8.53 


±5.83 


±4.86 


±4.18 


±3.68 


±2.15 


±1.19 




-0.45 


-1.60 


3.56 


4.67 


3.27 


4.11 


4.04 






5l 


±5.09 


±3.88 


±2.96 


±2.16 


±1.79 


±1.55 


±1.39 






2st 


3.54 


3.21 


3.17 


4.14 


2.88 


3.70 


3.32 






S2 


±2.81 


±2.27 


±1.66 


±1.26 


±1.08 


±0.93 


±0.69 






3rd 


6.27 


5.07 


3.90 


4.32 


3.58 


3.84 


3.94 








±2.30 


±1.65 


±1.20 


±0.89 


±0.74 


±0.64 


±0.57 








5.49 


5.22 


4.28 


4.73 


4.46 


4.49 


4.49 






^4 


±1.58 


±1.13 


±0.54 


±0.64 


±0.52 


±0.45 


±0.41 








4.55 


4.72 


4.30 


4.35 


4.01 


4.10 


4.08 






k 


±1.07 


±0.77 


±0.63 


±0.49 


±0.41 


±0.35 


±0.32 








4.51 


4.72 


4.12 


4.03 


3.83 


3.91 


3.88 








±0.91 


±0.68 


±0.54 


±0.42 


±0.34 


±0.29 


±0.26 






<jts 


5.13 


5.52 


4.35 


4.43 


4.24 


4.16 


4.02 






<h 


±0.77 


±0.55 


±0.11 


±0.35 


±0.28 


±0.21 


±0.22 






Qth 


4.98 


5.60 


4.33 


4.27 


4.21 


4.16 


4.08 








±0.67 


±0.52 


±0.43 


±0.32 


±0.26 


±0.22 


±0.20 






nth 


4.80 


5.60 


4.32 


4.31 


4.19 


4.21 


4.13 








±0.64 


±0.49 


±0.42 


±0.31 


±0.25 


±0.21 


±0.19 
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Table 8: A breakdown of the energy change AH in 10 molecular dynamics trajectories in 
terms of the change in gauge action (At/p^a^), pseudofermion action {A.Upseudo)-i ^-nd kinetic 
energy (Att^). Also listed are the estimates of A(TrlogM) = TrlogMi — TrlogM2 using 
600 Z2 noises with subtraction. The k is 0.150 in this case. 



vjri vj Li u 




/\ rf / nl n — TVP^/J 1 


AfTr lop-Ml 

L — 1 1 J. / J.VX 1 


JT dli 1 . 


ATI , 


1 88 9'?8 




A/f A/f 


ATT 

'-^'^ pseudo 


iO.UOU 


-U.Zil .z^ 1 






179 "114 






ATI 1 


^0 / .uuu 




M M 


ATI 

'-^'^ pseudo 


iO.OiO 


-O.0o\^.ZD j 




Att^ 

^-i /I 






Pfl IT X ■ 
± dli . 


ATI , 


-1 90 857 






ATI 

^—^^ pseudo 


-o.ouo 








1 97 'VVI 




Pair A- 


ATI , 


-40 8fi9 




A/T A/T 
M4, M5 


ATI 

'-^'-^ pseudo 








Att^ 

i—X/i 


fiQ 085 




1 dli U . 


ATI , 


-1 10 fi74 




A//" A//" 


ATI 

'-^'^ pseudo 


70 89 


Ofi/' 94"! 






181 c;i 

iOi.OiO 




JT dll U. 


ATI , 

'-^^plaq 


941 814 




Me, My 




67.498 


-3.69(.25) 




Att^ 


-309.213 




Pair 7: 


AC/p/aq 


82.339 




M7,M8 


AUpseudo 


-70.015 


-0.98(.24) 




Att^ 


-12.315 




Pair 8: 


AUplaq 


-692.873 




M8,M9 


A Upseudo 


-27.643 


6.51(.24) 




Att^ 


720.243 




Pair 9: 


AC/piaq 


260.435 




Mg, Mio 




2.186 


-1.14(.25) 




Att^ 


-262.522 




Pair 10: 


A?7p;ag 


-613.121 




Mio,Mn 




110.997 


6.40(.24) 




A^2 


501.916 




Ex. Pair: 


Ml, Mil 




5.85(.30) 
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5 Application to global density of states for Hermitian 
Hamiltonian systems 



The density of states p{z) for a Hamiltonian system with Hamiltonian matrix H is 

1 ^ 

P(^) = l^E^(^-^n)' (27) 

n=l 

where {A„} are the eigenvalues of H. 

In reference |T^, p{z) for real z is calculated for Hermitian H as follows: 

p{z) = - lim ImTr(H - {z + ze)l)-\ (28) 

Choosing a small e in Equation ( |28D yields a smoothed version of p{z). 

We have shown above that the trace in Eq. (^) may be estimated for several different 
values of z simultaneously using complex Z2 noise and the M^R algorithm. Thus, we may 
estimate the global density of states for a Hermitian matrix H at essentially the same com- 
putational cost (modulo additional memory) as estimating the local density of states at a 
single point. 



6 Conclusion and summary of advantages of the PZ 
algorithm 

The PZ method takes advantage of proven, effective numerical approximation techniques. 
The advantages of the PZ method are summarized as follows: 

• Pade approximation uses rational functions, which are known to be very efficient in 
the uniform approximation of analytic functions. In finding determinant ratios, the 
Pade approximation to the logarithm only needs to be accurate on the region in the 
complex plane where the spectra of Mi and M2 differ. 

• The complex Z2 random vectors have been shown to be superior to the gaussian I]! 

18] noise in computing traces of inverse matrices. 



The PZ method also takes advantage of the recently developed M^R algorithm to cal- 
culate all terms in the Pade expansion in Eq. (0) in essentially the same computational 
time required to calculate a single term, albeit with additional memory (one length N 
vector for each additional term). Hence, a higher order Pade expansion requires more 
memory, but essentially the same computation time (apart from the matrix condition- 
ing effects to be mentioned below). 

The entire method can be applied to non-Hermitian matrices: so determinants of non- 
Hermitian matrices may also be found directly, without recourse to the Hermitian 
matrix M^M. Negative and complex determinants can also be calculated in principle. 
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The Cj's in the Pade expansion in Eq. turn out to be real and positive, which 
improves the conditioning of the matrices M + Cjl and hence expedites the column 



inversions in Eqs. (19, ^0|). However, this effect diminishes for higher order Pade 



approximations, because the minimum Cj decreases as the order increases. 

The PZ method also holds promise of being useful in the case where different quark 
flavors are present, in which case it is necessary to compute multiple determinant ratios 
for matrices with different but constant diagonal terms. Using the M^R algorithm, this 
takes essentially the same computation time as a single determinant ratio. 

The unbiased variational subtraction scheme works quite well in reducing the stochastic 
error from the Z2 noise. The principle is general enough to be applied to other cases 
with stochastic estimates. 



In conclusion, we have demonstrated the efficiency of the Pade - Z2 algorithm in estimat- 
ing determinants and determinant ratios to high accuracy for lattice QCD. It is certainly 
applicable to other systems with large sparse matrices. For example, we have been able 
to reduce the error of the determinant ratio from 559% to 17% with the unbiased subtrac- 
tion scheme and a relatively small (~ 400) number of the noise vectors (see Table 6). It is 
rather encouraging as far as the the feasibility of using this algorithm to simulate dynamical 
fermions in full QCD is concerned. We shall pursue this in the future. 
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